ABSTRACT

A VLSI design methodology, built around the CHiP architecture, is
described. The switch lattice of the CHiP architecture is the primary
design abstraction. The lattice is a flexible design medium with
constraints that mirror those of raw silicon. An eight point pipelined
Fast Fourier Transform design, used as a running example, is of
independent interest for its locally connected layout.

Introduction

Between the conception of a real time signal processor and its
functional VLSI realization there is an enormous amount of effort
devoted to designing, revising, optimizing and testing. Since the process
is cumulative -- later work builds on previous work -- and since the
activity becomes progressively more detailed, more constrained and more
exacting, it follows that the global design parameters should be fully
explored. Global design decisions, when correct, can have a greater
effect on performance than many local optimizations. When the decisions
are wrong, they can cause continual difficulty. Accordingly, we propose a
design methodology based on the Configurable, Highly Parallel (CHiP)
architecture family [1] that focuses on exploring global design
parameters and is especially well suited to the VLSI implementation of
signal processing systems.

The characteristic that distinguishes digital signal processing design
problems from other large VLSI design problems, e.g., microprocessor
design, is that the former tend to require the assembly of a large number
of identical components while the latter often require the assembly of a
diverse collection of components. In terms of the widely discussed
hierarchical design methodology [2-4], this distinction means that signal
processors are characterized by a shallow hierarchy rather than a deep
hierarchy. The emphasis on decomposition in the hierarchical design
methodology, with its resulting deep hierarchy, provides less leverage
for signal processing design problems. Our CHiP computer methodology,
though hierarchical, emphasizes the layout of homogeneous components
and should provide greater leverage for signal processor design
situations.

The methodology is not a cookbook procedure. That is, there is not a
sequence of definite steps which, if followed from start to finish, result
in a real time signal processor. But there are steps: the designer
programs the algorithm for a CHiP computer, tests it, assesses the design,
revises it, programs the subparts, tests them, assesses their design,
revises them and, finally, specializes the entire system for silicon
implementation.

In order to organize our presentation of the methodology, we will
develop a design as a running example. Our problem will be to design a
pipelined, eight point Fast Fourier Transform processor. The reader need
not be acquainted with the FFT, since our intent is not to produce a
practical device. Rather, we are using the problem as a context in which
to focus on the design activity.

Problem  Statement 

Naturally, the first step in any design situation is to understand the
problem. For our running FFT example this can be conveniently stated
with a schematic diagram (Figure 1). Each processing element takes two
inputs, B and B', and computes two weighted sums, B + QB' and B - QB'.
(See Stone [5] for exact details.) Our assumptions are that the
processors receive data bit-serially from off the chip, that the structure
is pipelined and that the resulting circuit is to be placed on a single
chip. From these assumptions, we conclude that we will need to place
twelve processors, each capable of multiplying by a constant and adding,
and that the chip will require sixteen pins for data in addition to power,
ground and any control lines.
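
The two counts can be checked with a few lines of arithmetic. The sketch
below is ours, not part of the original design flow. It assumes a radix-2
structure with one file of N/2 two-input processors per stage and one
bit-serial pin for each data input and each data output; the name
`fft_resources` is hypothetical.

```python
import math

def fft_resources(n_points):
    """Return (processors, data_pins) for a pipelined radix-2 FFT.

    Assumptions (ours): n_points is a power of two, there are
    log2(n_points) files of n_points/2 two-input processors, and each
    data input and each data output uses one bit-serial pin.
    """
    stages = int(math.log2(n_points))
    if 2 ** stages != n_points:
        raise ValueError("n_points must be a power of two")
    processors = (n_points // 2) * stages
    data_pins = 2 * n_points
    return processors, data_pins

print(fft_resources(8))  # (12, 16)
```

For the eight point case this reproduces the twelve processors and sixteen
data pins stated above.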


Figure  1.  Pipelined  FFT  schematic. 


Programming  the  Algorithm 

The  next  step  in  the  methodology  is  to  program  the  algorithm  for  a 
CHiP  computer.  The  purpose  is  to  establish  an  unambiguous  specification 
of  the  problem  and  to  begin  initial  exploration  of  the  layout,  timing  and 
input/output  constraints.  Before  programming  our  FFT  example,  we 
must  introduce  CHiP  machines. 

A CHiP computer is one of a family of architectures specialized for
"fine-grained" parallelism and efficient VLSI implementation. The main
component of the architecture (and the only one of interest here*) is the
switch lattice. This is a homogeneous array of programmable switches
and data paths with processing elements (PEs) placed at regular
intervals. Figure 2 illustrates schematic diagrams of two switch lattices.
The switches and data paths are a general means of specifying information
flow and the processing elements serve to represent some arbitrary
computational activity.

*Other, more thorough descriptions of the CHiP machine have been given, but
they focus on its use as a general purpose parallel processor [1, 6]. Our
description here has been specialized to its use as a design abstraction.


Figure 2. Two switch lattices. Circles are switches; squares are
processing elements.


Ultimately, when the methodology has been worked through and the
design is completed, the switches and data paths will have been removed,
the active data paths will have been replaced by wires and the processing
elements will have been replaced by specialized circuits for the
particular function. But at this point this stylized representation of
the components gives the designer a simple, flexible means of simulating
the algorithm. The simplicity and flexibility make revision a less
painful process and encourage exploration and experimentation.


As Figure 2 illustrates, switch lattices differ in several respects.
Although the designer will choose a lattice suitable for the particular
algorithm, it is appropriate to mention the axes of variability. The
degree, d, of switches and processing elements refers to the number of
data paths incident to the device. Normally, we will have d=8 for switches
and PEs, although a higher degree for PEs may be convenient when there
are multiple inputs and outputs. (See below.) In Figure 2(a), d=8 and in
Figure 2(b), d=4.

The corridor width, w, refers to the number of switches separating two
neighboring PEs. (In Figure 2(a), w=1; in 2(b), w=2.) The more distinct
data paths that must pass between two processing elements, the wider
the corridor width must be. Since the switches will ultimately be
removed, there is no harm in specifying a large corridor width. However,
by calling explicit attention to corridor width, we cause the designer to
focus on data routing and to appreciate the consequences of haphazard
routing on density and packing. Notice that the corridor width is related
to the number of distinct data paths passing between two PEs, not to the
number of wires in each data path (which is set later).
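
To make the role of w concrete, here is a minimal sketch of a lattice as a
character grid. It assumes, as a simplification of ours, that PEs sit at
every (w+1)-th position in both directions and that every other site is a
switch; the names `build_lattice` and `switch_count` are hypothetical.

```python
def build_lattice(pe_rows, pe_cols, w):
    """Return the lattice as text rows: 'P' marks a PE, 'o' a switch.

    Assumption (ours): w switches separate neighboring PEs in both
    directions and every non-PE site holds a switch.
    """
    if pe_rows < 1 or pe_cols < 1 or w < 0:
        raise ValueError("need at least one PE and a non-negative width")
    step = w + 1
    n_rows = (pe_rows - 1) * step + 1
    n_cols = (pe_cols - 1) * step + 1
    return [
        "".join("P" if r % step == 0 and c % step == 0 else "o"
                for c in range(n_cols))
        for r in range(n_rows)
    ]

def switch_count(lattice):
    """Count the switch sites in a lattice built by build_lattice."""
    return sum(row.count("o") for row in lattice)

for row in build_lattice(2, 3, 1):
    print(row)
```

Under this model, widening the corridor for the same set of PEs only adds
switch sites, which is harmless in the abstraction because the switches
are removed at specialization.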

One programs the switch lattice simulator by giving "configuration
settings" for the switches and program text for the PEs. A configuration
setting specifies which of the incident data paths a switch is to connect.
If no configuration setting is given, the data paths are isolated. In the
figures we simply draw lines through switches to specify active settings.
The program text is given in a conventional sequential programming
language that has been extended with facilities to specify timing.*
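
A configuration setting can be pictured as a partition of some of a
switch's ports into connected groups. The sketch below is an illustration
under our own assumptions, not the simulator's interface: it assumes a
degree-eight switch with compass-named ports, and the class `Switch` and
its methods are hypothetical.

```python
# Assumed port names for a degree-eight device (hypothetical).
PORTS = ("North", "Northeast", "East", "Southeast",
         "South", "Southwest", "West", "Northwest")

class Switch:
    """A switch whose setting connects groups of incident data paths."""

    def __init__(self):
        # No configuration setting given: every data path is isolated.
        self.setting = []

    def configure(self, *groups):
        """Connect the ports within each group, e.g. ("West", "East")."""
        seen = set()
        for group in groups:
            for port in group:
                if port not in PORTS:
                    raise ValueError("unknown port: " + port)
                if port in seen:
                    raise ValueError("port used twice: " + port)
                seen.add(port)
        self.setting = [frozenset(group) for group in groups]

    def connected_to(self, port):
        """Return the ports joined to `port` by the current setting."""
        for group in self.setting:
            if port in group:
                return sorted(group - {port})
        return []

s = Switch()
s.configure(("West", "East"), ("Southwest", "Northeast"))
print(s.connected_to("West"))  # ['East']
```

Two groups in one setting model two data paths crossing at a switch
without being joined.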

Returning to our FFT example, we can specify our first embedding.
Figure 3 illustrates a direct embedding of the FFT interconnection
(Figure 1) in a switch lattice where w=2 and d=8. Because of the number of
data paths crossing from the upper half of the layout to the lower half, a
width w=2 is required. Notice that the layout is the same for each of the
three files.

*For the Blue CHiP Project's pilot simulator, the language is Pascal.


Figure  3.  Switch  lattice  embedding  of  the  FFT. 
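
That three identically wired files of four processors can compute an eight
point transform may be checked numerically. The sketch below uses a
constant-geometry, decimation-in-time arrangement chosen by us: in every
file, processor p reads positions 2p and 2p+1 and writes positions p and
p+N/2. It is one arrangement with this property and is not claimed to be
the exact wiring of Figure 1; all function names are hypothetical.

```python
import cmath

def bit_reverse(i, bits):
    """Reverse the low `bits` bits of i."""
    return int(format(i, "0{}b".format(bits))[::-1], 2)

def butterfly(b, b_prime, q):
    """One processing element: the weighted sums B + QB' and B - QB'."""
    return b + q * b_prime, b - q * b_prime

def pipelined_fft(samples):
    """FFT built from identically wired files of butterflies."""
    n = len(samples)
    bits = n.bit_length() - 1
    if n < 2 or n != 1 << bits:
        raise ValueError("length must be a power of two, at least 2")
    w = cmath.exp(-2j * cmath.pi / n)
    # Inputs enter in bit-reversed order (an assumption of this sketch).
    x = [samples[bit_reverse(i, bits)] for i in range(n)]
    for stage in range(bits):
        shift = bits - 1 - stage
        y = [0j] * n
        for p in range(n // 2):
            q = w ** ((p >> shift) << shift)  # constant held by PE p
            y[p], y[p + n // 2] = butterfly(x[2 * p], x[2 * p + 1], q)
        x = y
    return x

def direct_dft(samples):
    """Reference transform computed straight from the definition."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

data = [1, 2, 3, 4, 0, -1, -2, -3]
error = max(abs(a - b) for a, b in zip(pipelined_fft(data), direct_dft(data)))
print(error < 1e-9)  # True
```

Only the constant Q differs from processor to processor; the data routing
between files is the same at every stage.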

The execution of the CHiP computer is synchronous, so the development
of the PE code is a simple matter. Each PE executes a variant of

    L: READ B, READ B'
       C <- B + QB'
       C' <- B - QB'
       WRITE C, WRITE C'
       GOTO L

where the variant is determined by which PE ports the variable comes
from or goes to. For example, PE 1.1 would execute

    L: READ B FROM West; READ B' FROM Southwest
       C <- B + QB'
       C' <- B - QB'
       WRITE C TO East; WRITE C' TO Southeast
       GOTO L

PEs with degree greater than eight have their ports numbered.
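
For readers who prefer an executable form, one pass through the loop for a
single PE might be sketched as follows. This is an illustration in Python
rather than the simulator's language; the factory `make_pe` and the use of
dictionaries keyed by port name are assumptions of ours.

```python
def make_pe(q, in_ports=("West", "Southwest"),
            out_ports=("East", "Southeast")):
    """Build one PE variant; the port names select the variant."""
    def step(inputs):
        # READ B and B' from the two input ports.
        b = inputs[in_ports[0]]
        b_prime = inputs[in_ports[1]]
        # Compute the two weighted sums.
        c = b + q * b_prime
        c_prime = b - q * b_prime
        # WRITE C and C' to the two output ports.
        return {out_ports[0]: c, out_ports[1]: c_prime}
    return step

pe_1_1 = make_pe(q=1)
print(pe_1_1({"West": 3, "Southwest": 2}))  # {'East': 5, 'Southeast': 1}
```

Calling `step` once per synchronous cycle plays the role of the GOTO loop.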

Although the development of the program is the responsibility of the
designer, there are library embeddings available that embody careful
analysis and research.

Assessment and Revision

The next activity in the methodology is to assess the initial design
and make appropriate improvements. The goal here is to evaluate how
the design can be globally improved before investing any effort in the
detailed layout. Obviously, this activity will require a certain amount of
judgement and experience.

Our FFT has several favorable characteristics. It has a nearly square
aspect ratio (4:3) and has edge-to-edge data flow. The latter property is
important in order to reach the bonding pads, which are most
conveniently located on the perimeter. The main liability of our initial
design is the nonlocal data flow, i.e., the presence of long data paths.
When the design is laid out, some wires will have to be as long as the side
of a PE.

To solve this long data path problem, we observe that to achieve
edge-to-edge data flow, it is not necessary for the flow to be
unidirectional as it is in our initial design. In particular, an
alternative strategy is to route the data towards the center of the layout
and then back out towards the perimeter. To achieve such an in-and-out
data flow, we place the second file (2.x) of processing elements in the
center of the layout and place the first and third files around the edge.
Figure 4 illustrates this layout. The result is a design which still has
edge-to-edge data flow and short, local connections. (This particular
optimization may not generalize for larger shuffle graph problems, but the
concept of in-and-out, edge-to-edge data flow could have wide
application.)

The  assessment  and  revision  activity  is  iterated. 

In the second design the aspect ratio is now square, a minor
improvement. Unfortunately, the corners of the layout are unused. This
area can be used for bonding pads for the input/output wires of the
adjacent PEs. It could also be used for other logic depending on how the
design develops. (See below.)

When studying the way data enters and leaves the PEs in Figure 4,
one sees that there are two different processing element geometries. The
external PEs are alike and the internal PEs are alike*. It is obviously
undesirable to require two designs for the same function, so we
reprogram the external switches to convert the external PE geometries
to the internal form. (See Figure 5.) This gives one layout form.
Furthermore, if we reflect on how this stylized diagram will finally be
implemented, it is clear that since the two data paths will probably exit
from a PE together, the global data flow will be optimized if (1) they
enter the PE together and (2) these entry and exit points are on opposite
corners of a PE. We take these two conditions as constraints to be carried
over to the next phase of design. If we can accomplish these two in the
next phase we will have a better global organization. If we cannot, we
return to this point to reassess and revise.

Round Two

The process of programming the CHiP machine has resulted in an
unambiguous specification of the algorithm, a routing of the data flow, a
global layout and, presumably, the development of some test data that
was used when the algorithm was run on the CHiP architecture simulator.
But this first program is not intended to specify the algorithm in great
enough detail for direct VLSI design and layout. In particular, the
functional activity of the PEs is probably too complicated at this early
stage. In our FFT example, the inner product step is such a complicated
activity.

*The internal PEs are not quite alike: the (clockwise) meaning of the data
paths differs among them. This will be easily corrected later by a simple
wire crossover.

Figure 4. A revised FFT embedding with local communication.

Figure 5. A reprogramming of perimeter interconnections.

So the methodology dictates that we iterate the program-assess-revise
cycle until the functional activity of the PEs is sufficiently simple
to be directly implemented in VLSI or can be implemented by an available
library layout. Since the interconnection and global layout are now
fixed, it is necessary only to implement the specified activity of the PE.
This is accomplished by programming the CHiP architecture to implement
the algorithm specified by the PE code(s). It is this iterative
activity that gives the methodology its hierarchical capability.

During each subsequent round of programming-assessment-revision,
it is important to establish that the current CHiP program correctly
implements the specification of the previous level. This is a requirement
of any top-down design effort, and it is aided here by the previously
developed test data. (Notice that the test data may have to have its form
changed to reflect the changed level of detail. For example, at one level
the program can be simulated on words of data while at the next level it
might require bit-serial data.)

We return briefly to the FFT example to give a second level of layout.
Postulate a linear array of PEs to perform the inner product step based
on a pipelined multiplier [7]. The layout will have two serial inputs, B
and B', and will produce two serial outputs, B + QB' and B - QB'. The
coefficient, Q, will be stored internally to the layout, although it will
be shifted through to form the intermediate products. By our analysis from
the previous level, the current layout will receive its input at one
corner and must deliver its output to the opposite corner. This suggests a
"snaked" arrangement for the linear array of processing elements. (See
Figure 6.) Each PE has as input and output the three data values as well
as the partial product. The B value is carried along to be available at
the end for summing and differencing in the last cell. The control lines
could either be broadcast or transmitted sequentially [7] from the control
circuit that we will place in the corners of the global design.
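
The flow of values through such a linear array can be sketched at the word
level. The code below is an illustration under our own assumptions, not
the multiplier of [7]: it uses plain shift-and-add on integers, an
unsigned coefficient Q, and hypothetical names (`multiplier_cell`,
`sum_difference_cell`, `inner_product_step`). Each cell passes along B,
B', Q and the partial product, as described above.

```python
def multiplier_cell(b, b_prime, q, partial):
    """One cell: consume the low bit of q, pass all four values on."""
    if q & 1:
        partial += b_prime
    # B is carried along unchanged; B' and Q are shifted for the next cell.
    return b, b_prime << 1, q >> 1, partial

def sum_difference_cell(b, partial):
    """Last cell: form B + QB' and B - QB' from the finished product."""
    return b + partial, b - partial

def inner_product_step(b, b_prime, q, n_cells=4):
    """Run the values through n_cells multiplier cells and the last cell."""
    if not 0 <= q < (1 << n_cells):
        raise ValueError("q must fit in n_cells bits")
    state = (b, b_prime, q, 0)
    for _ in range(n_cells):
        state = multiplier_cell(*state)
    return sum_difference_cell(state[0], state[3])

print(inner_product_step(5, 3, 6))  # (23, -13)
```

The four-cell width is arbitrary; a real design would fix it from the word
length of the coefficient.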

Figure 6. A pipelined inner product layout.

As before, we should next program the activity of the PEs. This time
there will be a few different cells since the multiplier requires a few [7]
and there is a sum/difference cell. Then we embark on another sequence
of assess and revise iterations. Having illustrated how the "opposite
corner" data flow property established at the top level becomes a
constraint to be implemented at the second level, we forego further
detailed design.


Design  Specialization 

The program-assess-revise cycle continues until the processing
performed at each PE can be directly implemented as a VLSI design. The
needed cells are either produced or acquired from a library. Then the
design is specialized. That is, the VLSI designs replace the PEs in the
last CHiP program layout. The active data paths are replaced by wires and
all of the switches are removed. This result is then used to specialize
the next higher level program, i.e., it replaces the PEs in its
predecessor layout, etc. When the activity is completed, our stylized CHiP
lattice is gone and what remains is a completed VLSI design. For our
example, see the schematic in Figure 7.
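
The bottom-up substitution can be sketched as a recursion over the levels
of programs. Everything in the example data is hypothetical, including
the program names and the choice of four multiplier cells per inner
product layout; only the twelve top-level PEs come from the running
example.

```python
def specialize(level, programs):
    """Expand the program named `level` into a flat list of VLSI cells.

    A PE that names another program is replaced by that program's own
    expansion; any other PE is taken to be a directly implementable cell.
    Switches carry no function and so do not appear in the result.
    """
    cells = []
    for pe in programs[level]["pes"]:
        if pe in programs:
            cells.extend(specialize(pe, programs))
        else:
            cells.append(pe)
    return cells

# Hypothetical two-level hierarchy for the running example.
programs = {
    "fft8": {"pes": ["inner_product"] * 12},
    "inner_product": {
        "pes": ["multiplier_cell"] * 4 + ["sum_difference_cell"],
    },
}

cells = specialize("fft8", programs)
print(len(cells))  # 60
```

The sketch tracks only which cells survive; placing them and turning the
active data paths into wires is the geometric part of the step.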




Figure 7. The specialized layout; the wide spacing is for showing
interconnections.

Although it is straightforward, the specialization process is not quite
as trivial as just suggested. Its success depends on several conditions.
First, the aspect ratios and cell sizes must be properly controlled during
the design process in order to pack the cells easily. This condition is
easily met as long as the PEs perform closely related operations. In our
running example, the top level cells were identical; the second level cells
were sufficiently similar to justify an assumption of equal size.

Another complication for specialization is power and ground routing.
We recommend the following strategy. Perform the routing prior to
specialization but after all the VLSI cells are designed. At that point it
is known, relatively, where power and ground enter the cells. Then, route
the power and ground wires within each CHiP lattice layout starting at
the top level. This permits a convenient top-down routing with the added
advantage of knowing the target sites for the bottom level connections.

A word about simulation. As the program-assess-revise cycle is
performed, each program can be simulated in isolation using the data
(possibly revised) from the previous level. Moreover, the composite design
can be simulated at each cycle by logically substituting the programs of
each level for the PEs of the previous level. Once the PEs have been
replaced by VLSI cells, however, it is unclear to what extent the design
methodology can assist in efficient simulation. It is obviously compatible
with hierarchically-based VLSI design rule checking [8] and electrical
integrity checking [9].

Summary  and  Discussion 

The methodology we have presented focuses on global design issues
of a VLSI implementation: data flow, functional decomposition, geometric
layout of components. If we use "+" to denote "one or more applications
of", then the CHiP architecture methodology could be described as

    (program, test, (assess, revise, test)+)+ specialize

This methodology leads to a design with a shallow hierarchy, making it
most effective for highly regular algorithms such as digital signal
processing systems.
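
The cycle expression can be read as a regular expression over activity
names. The sketch below takes that reading literally, treating both
repetition marks as "one or more", which is our interpretation; the
one-letter encoding and the function `follows_methodology` are
hypothetical.

```python
import re

# One letter per activity (an encoding chosen for this sketch).
CODES = {"program": "p", "test": "t", "assess": "a",
         "revise": "r", "specialize": "s"}

# (program, test, (assess, revise, test)+)+ specialize
PATTERN = re.compile(r"(pt(art)+)+s")

def follows_methodology(steps):
    """Return True if the sequence of activities fits the expression."""
    encoded = "".join(CODES[step] for step in steps)
    return PATTERN.fullmatch(encoded) is not None

two_rounds = ["program", "test", "assess", "revise", "test",
              "program", "test", "assess", "revise", "test",
              "specialize"]
print(follows_methodology(two_rounds))  # True
```

A sequence that specializes before any program has been tested, for
example, is rejected.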

The CHiP architecture is crucial to the methodology. The switch
lattice provides a medium that mirrors raw silicon: it is planar; it has
integrated processing and interconnection facilities; it is described
geometrically; external data is available only at the perimeter.
Consequently, programming an algorithm for a CHiP architecture, though
reasonably convenient, gives a good approximation to a VLSI layout.

It is this feature, a convenient programming abstraction imposing
VLSI-like constraints, that perhaps most distinguishes the CHiP
methodology from others in which the specification form is divorced from
the technology.


Related  Results 

There  are  three  points  to  be  made  about  related  research. 

First, from our study of configuration settings we have developed a
library of efficient embeddings for commonly used interconnection
structures. These include single corridor, planar, linear area binary
trees [1, 10], toruses with no long data paths [10], shuffle-exchange
graphs with narrow corridors, etc. For example, Figure 8 shows a 64 node
shuffle-exchange graph embedded in a lattice with w=1 and d=8. This
embedding, due to Paul Morrissett [11], is of interest because, in
general, the shuffle-exchange graph requires very wide corridors [6]. In
addition, there are general embedding techniques known for common layout
problems: the Aleliunas-Rosenberg technique for bending data paths around
corners [1], and lacing for maximizing the number of data paths through
a region of the graph [10].


Figure  8.  A  64  node  shuffle-exchange  graph. 

Second, we have developed another methodology, called Processor
Displacement, that assists the designer in balancing pin limitations with
chip area utilization [12]. This approach to determining the optimal
amount of multiplexing is compatible with the CHiP architecture
methodology described here.

Third, the CHiP computer is intended to be a general purpose parallel
processor and as such it physically implements a switch lattice with
programmable switches and microprocessors as processing elements [1].
Were CHiP computers generally available, a signal processing system
could be built simply by running the top level program of our
methodology. This solution to constructing a special purpose signal
processor probably would not have sufficiently good performance to serve
most applications. Although easily accomplished, this would be too general
a solution for a high performance device. Our methodology, on the other
hand, can lead to high performance but requires much effort. There could
be a compromise solution: we are exploring the possibility of a
semispecialized CHiP computer which would replace the general purpose
microprocessor PEs with functional units tailored to a specific
application. CORDIC processors are good candidates for these specialized
PEs [13].

Type lattice for the "In-Out Bond" lattice class; origin unknown.


